Trying to have another look through our results for anything neurodegeneration-related, either at the level of phenotype or disease.
res <- MultiEWCE::load_example_results()
## Registered S3 method overwritten by 'ggtree':
## method from
## fortify.igraph ggnetwork
res <- HPOExplorer::add_disease(res)
## Annotating phenos with Disease
## Importing existing file: ... phenotype.hpoa
annot <- HPOExplorer::load_phenotype_to_genes(3)
## Importing existing file: ... phenotype.hpoa
First, we do some initial filtering of the results to only get significant associations.
res <- res[q<0.05 & symptoms.pval<0.05,]
AD is not enriched for any phenotype, likely due to the small number of genes included. GWAS results are more comprehensive than this. https://hpo.jax.org/app/browse/term/HP:0002511
ad <- res[grepl("alzheimer",Phenotype,ignore.case = TRUE)]
nrow(ad)
## [1] 0
“Parkinson’s disease” is not in the HPO, but the branch “Parkinsonism” is. https://hpo.jax.org/app/browse/term/HP:0001300
We actually do have 1 sig phenotype here: “Parkinsonism”
pd_phenos <- HPOExplorer::make_phenos_dataframe(ancestor="Parkinsonism")
## Importing existing file: ... phenotype_to_genes.txt
## Extracting data for 2 descendents.
## Computing gene counts.
## Getting absolute ontology level for 2 HPO IDs.
## Computing ontology level / gene count ratio.
## Adding information_content scores.
## Adding term definitions.
## Adding level-3 ancestor to each HPO ID.
## Making hoverboxes from: 'Phenotype', 'HPO_ID', 'ontLvl', 'ontLvl_geneCount_ratio', 'definition', 'ancestor', 'ancestor_name'
pd1 <- res[HPO_ID %in% pd_phenos$HPO_ID ]
MultiEWCE::create_dt(pd1)
## Loading required namespace: DT
We can also search for phenotypes via “Parkinson’s” as a disease.
This gives us lots of phenotypes, but many of them are not specific to PD: e.g. Autosomal recessive inheritance, Open mouth, Macrocephaly, Urinary urgency
d <- annot[grepl("parkinson",DiseaseName, ignore.case = TRUE)]
pd <- res[HPO_ID %in% d$HPO_ID]
nrow(pd)
## [1] 9928
Let’s try to narrow it down to very PD-specific phenotypes.
These PD-associated phenotypes only appear once in the HPO annotations. Thus, they must be PD-specific.
That leaves us with just 2 phenotypes.
n_pd_terms <- length(unique(pd_phenos$HPO_ID))
pd_pheno_freqs <- sort(table(annot[HPO_ID %in% d$HPO_ID]$HPO_ID))
hist(pd_pheno_freqs, 50)
pd_specific <- pd[HPO_ID %in% names(pd_pheno_freqs)[pd_pheno_freqs<=1+n_pd_terms]]
MultiEWCE::create_dt(pd_specific)
We can lift this filter a bit and look for PD-associated phenotypes
that only appear in 3 or less non-PD diseases. We see phenotype that can
indeed be associated with PD, but are not necessarily exclusive to it:
e.g.
Abnormal aggressive, impulsive or violent behavior,
Pill-rolling tremor, “Abnormal CSF protein level
pd_specific2 <- pd[HPO_ID %in% names(pd_pheno_freqs)[pd_pheno_freqs<=3+n_pd_terms] ]
MultiEWCE::create_dt(pd_specific2)
Get all the phenos from the Mental deterioration branch. This includes many forms of dementia https://hpo.jax.org/app/browse/term/HP:0001268
As we’ve seen before, only the higher-level term “Mental deterioration” remains significant. Perhaps due to the greater number of genes in this higher level.
phenos <- HPOExplorer::make_phenos_dataframe(ancestor="Mental deterioration")
## Importing existing file: ... phenotype_to_genes.txt
## Extracting data for 14 descendents.
## Computing gene counts.
## Getting absolute ontology level for 14 HPO IDs.
## Computing ontology level / gene count ratio.
## Adding information_content scores.
## Adding term definitions.
## Adding level-3 ancestor to each HPO ID.
## Making hoverboxes from: 'Phenotype', 'HPO_ID', 'ontLvl', 'ontLvl_geneCount_ratio', 'definition', 'ancestor', 'ancestor_name'
md <- res[HPO_ID %in% phenos$HPO_ID]
MultiEWCE::create_dt(md)
HPO has the term: Atrophy/Degeneration affecting the central nervous system https://hpo.jax.org/app/browse/term/HP:0007367
In our results, we found a number of significantly enriched phenotypes that fall under that branch:
phenos <- HPOExplorer::make_phenos_dataframe(ancestor="Atrophy/Degeneration affecting the central nervous system")
## Importing existing file: ... phenotype_to_genes.txt
## Extracting data for 41 descendents.
## Computing gene counts.
## Getting absolute ontology level for 41 HPO IDs.
## Computing ontology level / gene count ratio.
## Adding information_content scores.
## Adding term definitions.
## Adding level-3 ancestor to each HPO ID.
## Making hoverboxes from: 'Phenotype', 'HPO_ID', 'ontLvl', 'ontLvl_geneCount_ratio', 'definition', 'ancestor', 'ancestor_name'
at <- res[HPO_ID %in% phenos$HPO_ID]
MultiEWCE::create_dt(at)
Next, let’s find examples of celltype-specific enrichment results that relate to diseases/phenotypes in the categories of dementia/neurodegeneration vs. diabetes/obesity.
queries <- paste("alzheimer","parkinson","diabetes","obesity",sep = "|")
phenos <- rbind(
HPOExplorer::make_phenos_dataframe(ancestor="Atrophy/Degeneration affecting the central nervous system"),
HPOExplorer::make_phenos_dataframe(ancestor="Mental deterioration"),
HPOExplorer::make_phenos_dataframe(ancestor="Diabetes mellitus"),
HPOExplorer::make_phenos_dataframe(ancestor="Obesity")
)
## Importing existing file: ... phenotype_to_genes.txt
## Extracting data for 41 descendents.
## Computing gene counts.
## Getting absolute ontology level for 41 HPO IDs.
## Computing ontology level / gene count ratio.
## Adding information_content scores.
## Adding term definitions.
## Adding level-3 ancestor to each HPO ID.
## Making hoverboxes from: 'Phenotype', 'HPO_ID', 'ontLvl', 'ontLvl_geneCount_ratio', 'definition', 'ancestor', 'ancestor_name'
## Importing existing file: ... phenotype_to_genes.txt
## Extracting data for 14 descendents.
## Computing gene counts.
## Getting absolute ontology level for 14 HPO IDs.
## Computing ontology level / gene count ratio.
## Adding information_content scores.
## Adding term definitions.
## Adding level-3 ancestor to each HPO ID.
## Making hoverboxes from: 'Phenotype', 'HPO_ID', 'ontLvl', 'ontLvl_geneCount_ratio', 'definition', 'ancestor', 'ancestor_name'
## Importing existing file: ... phenotype_to_genes.txt
## Extracting data for 11 descendents.
## Computing gene counts.
## Getting absolute ontology level for 11 HPO IDs.
## Computing ontology level / gene count ratio.
## Adding information_content scores.
## Adding term definitions.
## Adding level-3 ancestor to each HPO ID.
## Making hoverboxes from: 'Phenotype', 'HPO_ID', 'ontLvl', 'ontLvl_geneCount_ratio', 'definition', 'ancestor', 'ancestor_name'
## Importing existing file: ... phenotype_to_genes.txt
## Extracting data for 4 descendents.
## Computing gene counts.
## Getting absolute ontology level for 4 HPO IDs.
## Computing ontology level / gene count ratio.
## Adding information_content scores.
## Adding term definitions.
## Adding level-3 ancestor to each HPO ID.
## Making hoverboxes from: 'Phenotype', 'HPO_ID', 'ontLvl', 'ontLvl_geneCount_ratio', 'definition', 'ancestor', 'ancestor_name'
res_inter <- res[grepl(queries,Phenotype,ignore.case = TRUE) |
grepl(queries,DiseaseName,ignore.case = TRUE) |
HPO_ID %in% phenos$HPO_ID]
message(formatC(nrow(res_inter),big.mark = ",")," results remain.")
## 511 results remain.
We still have a lot of results left (500+ rows), which is a bit much to visualize. So let’s do some additional filtering.
res_filt <- MultiEWCE::prioritise_targets(results = res_inter,
keep_tiers = NULL,
severity_threshold = NULL,
keep_onsets = NULL,
keep_deaths = NULL,
pheno_ndiseases_threshold = NULL,
symptom_p_threshold = NULL,
symptom_intersection_size_threshold = 2)
## Prioritising gene targets.
## Adding HPO IDs.
## Importing existing file: ... phenotype_to_genes.txt
## Translating all phenotypes to HPO IDs.
## + Returning a vector of phenotypes (same order as input).
## Adding term definitions.
## Prioritised targets: step='start'
## - Rows: 511
## - Phenotypes: 96
## - Diseases: 71
## - Cell types: 32
## Filtering @ q-value <= 0.05
## Prioritised targets: step='q_threshold'
## - Rows: 511
## - Phenotypes: 96
## - Diseases: 71
## - Cell types: 32
## Filtering @ fold-change >= 1
## Prioritised targets: step='fold_threshold'
## - Rows: 511
## - Phenotypes: 96
## - Diseases: 71
## - Cell types: 32
## Adding level-3 ancestor to each HPO ID.
## Removing remove descendants of: 'Clinical course'
## Translating all phenotypes to HPO IDs.
## + Returning a dictionary of phenotypes (different order as input).
## Prioritised targets: step='remove_descendants'
## - Rows: 510
## - Phenotypes: 95
## - Diseases: 71
## - Cell types: 32
## Getting absolute ontology level for 95 HPO IDs.
## Prioritised targets: step='keep_ont_levels'
## - Rows: 510
## - Phenotypes: 95
## - Diseases: 71
## - Cell types: 32
## Annotating phenos with Onset.
## Prioritised targets: step='keep_onsets'
## - Rows: 510
## - Phenotypes: 95
## - Diseases: 71
## - Cell types: 32
## Annotating phenos with AgeOfDeath
## Prioritised targets: step='keep_deaths'
## - Rows: 510
## - Phenotypes: 95
## - Diseases: 71
## - Cell types: 32
## Annotating phenos with Tiers.
## Prioritised targets: step='keep_tiers'
## - Rows: 510
## - Phenotypes: 95
## - Diseases: 71
## - Cell types: 32
## Annotating phenos with Modifiers
## Prioritised targets: step='severity_threshold'
## - Rows: 510
## - Phenotypes: 95
## - Diseases: 71
## - Cell types: 32
## Annotating phenos with n_diseases
## Importing existing file: ... phenotype_to_genes.txt
## Importing existing file: ... genes_to_phenotype.txt
## Importing existing file: ... phenotype.hpoa
## Prioritised targets: step='pheno_ndiseases_threshold'
## - Rows: 510
## - Phenotypes: 95
## - Diseases: 71
## - Cell types: 32
## Annotating phenotype frequencies.
## Prioritised targets: step='pheno_frequency_threshold'
## - Rows: 510
## - Phenotypes: 95
## - Diseases: 71
## - Cell types: 32
## Prioritised targets: step='symptom_p_threshold'
## - Rows: 510
## - Phenotypes: 95
## - Diseases: 71
## - Cell types: 32
## Prioritised targets: step='symptom_intersection_size_threshold'
## - Rows: 501
## - Phenotypes: 87
## - Diseases: 65
## - Cell types: 30
## 21 / 30 of cell types kept.
## Prioritised targets: step='keep_celltypes'
## - Rows: 372
## - Phenotypes: 81
## - Diseases: 56
## - Cell types: 21
## Filtering by gene size.
## Converting phenos to GRanges.
## Translating all phenotypes to HPO IDs.
## + Returning a vector of phenotypes (same order as input).
## Loading required namespace: ensembldb
## Gathering gene metadata
## Loading required namespace: EnsDb.Hsapiens.v75
## Prioritised targets: step='keep_seqnames'
## - Rows: 41,687
## - Phenotypes: 81
## - Genes: 3,739
## 246 / 3,739 genes kept.
## Prioritised targets: step='gene_size'
## - Rows: 2,315
## - Phenotypes: 79
## - Genes: 246
## Prioritised targets: step='keep_biotypes'
## - Rows: 2,315
## - Phenotypes: 79
## - Genes: 246
## Filtering by mean_exp_quantile.
## Annotating gene frequencies.
## Importing existing file: ... genes_to_phenotype.txt
## Prioritised targets: step='gene_frequency_threshold'
## - Rows: 11,393
## - Phenotypes: 79
## - Diseases: 56
## - Cell types: 21
## - Genes: 246
## Prioritised targets: step='keep_specificity_quantiles'
## - Rows: 11,393
## - Phenotypes: 79
## - Diseases: 56
## - Cell types: 21
## - Genes: 246
## Prioritised targets: step='keep_mean_exp_quantiles'
## - Rows: 8,851
## - Phenotypes: 79
## - Diseases: 56
## - Cell types: 21
## - Genes: 200
## Prioritised targets: step='symptom_gene_overlap'
## - Rows: 148
## - Phenotypes: 34
## - Diseases: 16
## - Cell types: 7
## - Genes: 13
## Sorting rows.
## Prioritised targets: step='end'
## - Rows: 148
## - Phenotypes: 34
## - Diseases: 16
## - Cell types: 7
## - Genes: 13
vn <- MultiEWCE::prioritise_targets_network(
top_targets = res_filt$top_targets,
submain = "dementia/neurodegeneration & diabetes/obesity",
save_path = here::here("networks/dementia_diabetes_network.html"),
show_plot = FALSE)
## Loading required namespace: pals
## Loading required namespace: igraph
## Loading required namespace: tidygraph
## Creating network.
## Loading required namespace: visNetwork
## Creating plot.
## Saving plot ==> /Users/schilder/Desktop/ewce/RareDiseasePrioritisation/networks/dementia_diabetes_network.html
vn$plot
sessionInfo()
## R version 4.2.1 (2022-06-23)
## Platform: x86_64-apple-darwin17.0 (64-bit)
## Running under: macOS Big Sur ... 10.16
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## loaded via a namespace (and not attached):
## [1] utf8_1.2.3 R.utils_2.12.2
## [3] tidyselect_1.2.0 RSQLite_2.3.0
## [5] AnnotationDbi_1.60.2 htmlwidgets_1.6.1
## [7] grid_4.2.1 BiocParallel_1.32.5
## [9] munsell_0.5.0 codetools_0.2-19
## [11] DT_0.27 colorspace_2.1-0
## [13] Biobase_2.58.0 filelock_1.0.2
## [15] highr_0.10 knitr_1.42
## [17] rstudioapi_0.14 orthogene_1.4.1
## [19] stats4_4.2.1 SingleCellExperiment_1.20.0
## [21] ggsignif_0.6.4 MatrixGenerics_1.10.0
## [23] GenomeInfoDbData_1.2.9 bit64_4.0.5
## [25] rprojroot_2.0.3 coda_0.19-4
## [27] vctrs_0.6.0 treeio_1.23.1
## [29] generics_0.1.3 xfun_0.37
## [31] BiocFileCache_2.6.1 R6_2.5.1
## [33] GenomeInfoDb_1.34.9 pals_1.7
## [35] AnnotationFilter_1.22.0 bitops_1.0-7
## [37] cachem_1.0.7 gridGraphics_0.5-1
## [39] DelayedArray_0.24.0 promises_1.2.0.1
## [41] BiocIO_1.8.0 scales_1.2.1
## [43] gtable_0.3.1 tidygraph_1.2.3
## [45] ontologyPlot_1.6 ensembldb_2.22.0
## [47] rlang_1.1.0 rtracklayer_1.58.0
## [49] rstatix_0.7.2 lazyeval_0.2.2
## [51] dichromat_2.0-0.1 broom_1.0.4
## [53] BiocManager_1.30.20 yaml_2.3.7
## [55] reshape2_1.4.4 HPOExplorer_0.99.7
## [57] abind_1.4-5 GenomicFeatures_1.50.4
## [59] ggnetwork_0.5.12 crosstalk_1.2.0
## [61] backports_1.4.1 httpuv_1.6.9
## [63] tools_4.2.1 ggplotify_0.1.0
## [65] statnet.common_4.8.0 ggplot2_3.4.1
## [67] ellipsis_0.3.2 gplots_3.1.3
## [69] jquerylib_0.1.4 paintmap_1.0
## [71] RColorBrewer_1.1-3 BiocGenerics_0.44.0
## [73] Rcpp_1.0.10 plyr_1.8.8
## [75] visNetwork_2.1.2 progress_1.2.2
## [77] zlibbioc_1.44.0 purrr_1.0.1
## [79] RCurl_1.98-1.10 prettyunits_1.1.1
## [81] ggpubr_0.6.0 S4Vectors_0.36.2
## [83] SummarizedExperiment_1.28.0 grr_0.9.5
## [85] here_1.0.1 magrittr_2.0.3
## [87] data.table_1.14.8 ProtGenerics_1.30.0
## [89] matrixStats_0.63.0 hms_1.1.2
## [91] patchwork_1.1.2 mime_0.12
## [93] evaluate_0.20 xtable_1.8-4
## [95] XML_3.99-0.13 EWCE_1.7.2
## [97] IRanges_2.32.0 MultiEWCE_0.1.4
## [99] compiler_4.2.1 biomaRt_2.54.0
## [101] maps_3.4.1 tibble_3.2.0
## [103] KernSmooth_2.23-20 crayon_1.5.2
## [105] R.oo_1.25.0 htmltools_0.5.4
## [107] ggfun_0.0.9 later_1.3.0
## [109] tidyr_1.3.0 aplot_0.1.10
## [111] DBI_1.1.3 ExperimentHub_2.6.0
## [113] gprofiler2_0.2.1 dbplyr_2.3.1
## [115] rappdirs_0.3.3 babelgene_22.9
## [117] EnsDb.Hsapiens.v75_2.99.0 Matrix_1.5-3
## [119] car_3.1-1 piggyback_0.1.4
## [121] cli_3.6.0 R.methodsS3_1.8.2
## [123] igraph_1.4.1 parallel_4.2.1
## [125] GenomicRanges_1.50.2 pkgconfig_2.0.3
## [127] GenomicAlignments_1.34.1 plotly_4.10.1
## [129] xml2_1.3.3 ggtree_3.6.2
## [131] bslib_0.4.2 XVector_0.38.0
## [133] GeneOverlap_1.34.0 yulab.utils_0.0.6
## [135] stringr_1.5.0 digest_0.6.31
## [137] graph_1.76.0 Biostrings_2.66.0
## [139] rmarkdown_2.20.1 HGNChelper_0.8.1
## [141] tidytree_0.4.2 restfulr_0.0.15
## [143] curl_5.0.0 shiny_1.7.4
## [145] Rsamtools_2.14.0 gtools_3.9.4
## [147] rjson_0.2.21 lifecycle_1.0.3
## [149] nlme_3.1-162 jsonlite_1.8.4
## [151] carData_3.0-5 network_1.18.1
## [153] mapproj_1.2.11 viridisLite_0.4.1
## [155] limma_3.54.2 fansi_1.0.4
## [157] pillar_1.8.1 ontologyIndex_2.10
## [159] lattice_0.20-45 homologene_1.4.68.19.3.27
## [161] KEGGREST_1.38.0 fastmap_1.1.1
## [163] httr_1.4.5 interactiveDisplayBase_1.36.0
## [165] glue_1.6.2 RNOmni_1.0.1
## [167] png_0.1-8 ewceData_1.7.1
## [169] BiocVersion_3.16.0 bit_4.0.5
## [171] Rgraphviz_2.42.0 stringi_1.7.12
## [173] sass_0.4.5 blob_1.2.3
## [175] AnnotationHub_3.6.0 caTools_1.18.2
## [177] memoise_2.0.1 dplyr_1.1.0
## [179] ape_5.7-1